Sympathy Begins with a Smile, Intelligence Begins with a Word: Use of Multimodal Features in Spoken Human-Robot Interaction
Abstract
Recognition of social signals, from human facial expressions or prosody of speech, is a popular research topic in human-robot interaction studies. There is also a long line of research in the spoken dialogue community that investigates user satisfaction in relation to dialogue characteristics. However, very little research relates a combination of multimodal social signals and language features detected during spoken face-to-face human-robot interaction to the resulting user perception of a robot. In this paper we show how different emotional facial expressions of human users, in combination with prosodic characteristics of human speech and features of human-robot dialogue, correlate with users' impressions of the robot after a conversation. We find that happiness in the user's recognised facial expression strongly correlates with likeability of a robot, while dialogue-related features (such as the number of human turns or the number of sentences per robot utterance) correlate with perceiving a robot as intelligent. In addition, we show that facial expression, emotional features, and prosody are better predictors of human ratings related to perceived robot likeability and anthropomorphism, while linguistic and non-linguistic features more often predict perceived robot intelligence and interpretability. As such, these characteristics may in future be used as an online reward signal for in-situ Reinforcement Learning-based adaptive human-robot dialogue systems.

Figure 1: Left: a live view of the experimental setup, showing a participant interacting with Pepper. Right: a diagram of the experimental setup, showing the participant (green) and the robot (white) positioned face to face. The scene was recorded by cameras (triangles C) from the robot's perspective, focusing on the face of the participant, and from the side, showing the whole scene. The experimenter (red) was seated behind a divider.
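To make the abstract's analysis concrete, the sketch below computes Pearson correlations between per-participant multimodal features (e.g., mean recognised happiness, a prosody statistic, number of human turns) and post-interaction questionnaire ratings, and derives a naive per-turn reward from the happiness score, as the abstract suggests for RL-based adaptation. This is a minimal illustrative sketch: the feature names, data, baseline, and rating scales here are hypothetical and do not come from the paper.

```python
# Illustrative sketch (not the paper's actual pipeline): correlate
# per-participant multimodal features with post-interaction ratings,
# and derive a naive per-turn reward from recognised happiness.
# All feature names, data, and scales below are hypothetical.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_participants = 30

# Hypothetical per-participant aggregates over one conversation each.
features = {
    "mean_happiness": rng.uniform(0.0, 1.0, n_participants),  # facial expression recogniser output
    "pitch_variance": rng.uniform(0.0, 1.0, n_participants),  # prosodic feature
    "n_human_turns": rng.integers(5, 40, n_participants).astype(float),  # dialogue feature
}

# Hypothetical post-interaction questionnaire ratings (1-5 Likert).
ratings = {
    "likeability": rng.uniform(1.0, 5.0, n_participants),
    "intelligence": rng.uniform(1.0, 5.0, n_participants),
}

# Pairwise Pearson correlations between features and ratings.
for f_name, f_vals in features.items():
    for r_name, r_vals in ratings.items():
        r, p = pearsonr(f_vals, r_vals)
        print(f"{f_name:>15} vs {r_name:<12} r={r:+.2f} p={p:.3f}")

def turn_reward(happiness: float, baseline: float = 0.5) -> float:
    """Naive online reward signal: positive when the user appears
    happier than a neutral baseline, negative otherwise.
    Purely illustrative; the baseline value is an assumption."""
    return happiness - baseline
```

In a real study, the random arrays would be replaced by recogniser outputs aggregated per participant, and significance testing would need correction for the many feature-rating pairs being compared.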
Similar papers
Multimodal Open-Domain Conversations with the Nao Robot
In this paper we discuss the design of human-robot interaction focussing especially on social robot communication and multimodal information presentation. As a starting point we use the WikiTalk application, an open-domain conversational system which has been previously developed using a robotics simulator. We describe how it can be implemented on the Nao robot platform, enabling Nao to make in...
Designing a Multimodal Spoken Component of the Australian National Corpus
Spoken language and interaction lie at the core of human experience. The primary medium of communication is speech, with some estimating the ratio of spoken-written language to be as high as 90%-10% (Cermák, 2009, p. 115). Yet they have remained poor cousins in the building of corpora to date. Not only are spoken corpora much smaller than written corpora (Xiao, 2008), the overwhelming focus in ...
Autonomous Acquisition of Natural Language
An important part of human intelligence is the ability to use language. Humans learn how to use language in a society of language users, which is probably the most effective way to learn a language from the ground up. Principles that might allow artificial agents to learn language this way are not known at present. Here we present a framework which begins to address this challenge. Our auto-...
Building a Multimodal Human-Robot Interface
However, the situation becomes a bit more complex when we begin to build and interact with machines or robots that either look like humans or have human functionalities and capabilities. Then, people might well interact with their humanlike machines in ways that mimic human-human communication. For example, if a robot has a face, a human might interact with it similarly to how humans interact ...
"Look at this!" learning to guide visual saliency in human-robot interaction
We learn to direct computational visual attention in multimodal (i.e., pointing gestures and spoken references) human-robot interaction. For this purpose, we train a conditional random field to integrate features that reflect low-level visual saliency, the likelihood of salient objects, the probability that a given pixel is pointed at, and – if available – spoken information about the target ob...
Publication year: 2017